Meta-Learning by the Baldwin Effect
The scope of the Baldwin effect was recently called into question by two
papers that closely examined the seminal work of Hinton and Nowlan. To date,
there has been no demonstration of its necessity in empirically
challenging tasks. Here we show that the Baldwin effect is capable of evolving
few-shot supervised and reinforcement learning mechanisms, by shaping the
hyperparameters and the initial parameters of deep learning algorithms.
Furthermore, it can genetically accommodate strong learning biases on the same
set of problems as a recent machine learning algorithm, MAML (Model-Agnostic
Meta-Learning), which uses second-order gradients instead of evolution to learn
a set of reference parameters (initial weights) that allow rapid adaptation to
tasks sampled from a distribution. Whilst in simple cases MAML is
more data efficient than the Baldwin effect, the Baldwin effect is more general
in that it does not require gradients to be backpropagated to the reference
parameters or hyperparameters, and permits effectively any number of gradient
updates in the inner loop. The Baldwin effect learns strong learning-dependent
biases, rather than purely genetically accommodating fixed behaviours in a
learning-independent manner.
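As a rough illustration of the contrast the abstract draws, the sketch below shows MAML's two-level structure on a toy problem: an inner gradient step adapts the reference parameters to one task, and the outer update differentiates through that step (the "second-order gradients" mentioned above). This is a minimal JAX sketch under assumed details (a linear model, a sine-regression toy task, a single inner step); none of these specifics are taken from the paper.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Deliberately tiny model: params = (w, b). Purely illustrative.
    w, b = params
    return w * x + b

def task_loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def inner_update(params, x, y, inner_lr=0.01):
    # One step of task-specific adaptation (MAML's inner loop).
    grads = jax.grad(task_loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)

def maml_loss(params, x_train, y_train, x_val, y_val):
    # Post-adaptation loss on held-out task data; differentiating this
    # w.r.t. `params` backpropagates through the inner update, which is
    # where the second-order terms arise.
    adapted = inner_update(params, x_train, y_train)
    return task_loss(adapted, x_val, y_val)

params = (jnp.array(0.0), jnp.array(0.0))
x = jax.random.uniform(jax.random.PRNGKey(0), (10,), minval=-1.0, maxval=1.0)
y = jnp.sin(3.0 * x)  # one task sampled from a hypothetical distribution
outer_grads = jax.grad(maml_loss)(params, x[:5], y[:5], x[5:], y[5:])
```

A Baldwinian variant would replace the final `jax.grad` call with an evolutionary estimate of the outer update (e.g. a population-based search over `params` scored by post-adaptation loss), which is why it needs no gradients backpropagated to the reference parameters and tolerates any number of inner-loop updates.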
An Empirical Study of Implicit Regularization in Deep Offline RL
Deep neural networks are the most commonly used function approximators in
offline reinforcement learning. Prior works have shown that neural nets trained
with TD-learning and gradient descent can exhibit implicit regularization that
can be characterized by under-parameterization of these networks. Specifically,
the rank of the penultimate feature layer, also called the "effective rank",
has been observed to collapse drastically during training. In turn, this
collapse has been argued to reduce the model's ability to further adapt in
later stages of learning, leading to diminished final performance. Such an
association between the effective rank and performance makes effective rank
compelling for offline RL, primarily for offline policy evaluation. In this
work, we conduct a careful empirical study on the relation between effective
rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind
Lab. We observe that a direct association exists only in restricted settings
and disappears in more extensive hyperparameter sweeps. We also empirically
identify three phases of learning that explain the impact of implicit
regularization on the learning dynamics, and find that bootstrapping alone is
insufficient to explain the collapse of the effective rank. Further,
we show that several other factors could confound the relationship between
effective rank and performance and conclude that studying this association
under simplistic assumptions could be highly misleading.
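For concreteness, a common way prior work on implicit under-parameterization operationalises the effective rank is as the smallest number of singular values of the feature matrix that captures all but a small fraction delta of the total spectral mass. The sketch below implements that assumed variant; delta = 0.01 is a conventional choice from that literature, not necessarily the exact setting used in this paper.

```python
import jax.numpy as jnp

def effective_rank(features, delta=0.01):
    # `features`: (batch, dim) matrix of penultimate-layer activations.
    s = jnp.linalg.svd(features, compute_uv=False)
    cumulative = jnp.cumsum(s) / jnp.sum(s)
    # Smallest k such that the top-k singular values hold a (1 - delta)
    # fraction of the total spectral mass.
    return int(jnp.argmax(cumulative >= 1.0 - delta)) + 1
```

Under this measure, rank collapse shows up as the returned k shrinking over the course of training even though the nominal width of the layer stays fixed.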